Indexing Issues in Supporting Similarity Searching
Abstract
Indexing issues that arise in the support of similarity searching are presented. This includes a discussion of the curse of dimensionality, as well as multidimensional indexing, distance-based indexing, dimension reduction, and embedding methods.
Similar resources
Conceptual Search Based on Semantic Relatedness
Traditional search engines based on syntactic search are unable to solve key issues like synonymy and polysemy. The effort to solve these issues led to the invention of the semantic web, and semantic search engines do overcome them. Today, however, the most important part of the data still consists of unstructured documents, and annotating such big data is very time consuming. Concept based retri...
Search Efficiency in Indexing Structures for Similarity Searching
Similarity searching finds application in a wide variety of domains including multilingual databases, computational biology, pattern recognition and text retrieval. Similarity is measured in terms of a distance function (edit distance) in general metric spaces, which is expensive to compute. Indexing techniques can be used to reduce the number of distance computations. We present an analysis of va...
Indexing and Searching Mathematics in Digital Libraries
This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system is presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware...
Efficient Document Indexing Using Pivot Tree
We present a novel method for efficiently searching for the top-k neighbors of documents represented in a high-dimensional space of terms, based on cosine similarity. Documents are mostly stored in a bag-of-words tf-idf representation. One of the most used ways of computing similarity between a pair of documents is cosine similarity between the vector representations, but cosine similarity is not a met...
Grouping and Indexing Color Features for Efficient Image Retrieval
Content-based image retrieval (CBIR) aims at searching image databases for specific images that are similar to a given query image based on matching of features derived from the image content. This paper focuses on a low-dimensional color based indexing technique for achieving efficient and effective retrieval performance. In our approach, the color features are extracted using the mean shift a...
Publication date: 2004